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Abstract Interest is in evaluating, by Markov chain Monte Carlo (MCMC) 
simulation, the expected value of a function with respect to a, possibly unnor- 
malized, probability distribution. A general purpose variance reduction tech- 
nique for the MCMC estimator, based on the zero-variance principle intro- 
duced in the physics literature, is proposed. Conditions for asymptotic unbi- 
asedness of the zero- variance estimator are derived. A central limit theorem is 
also proved under regularity conditions. The potential of the idea is illustrated 
with real applications to probit, logit and GARCH Bayesian models. For all 
these models, a central limit theorem and unbiasedness for the zero-variance 
estimator arc proved (see the supplementary material available on-line). 

Keywords Control variates • GARCH models • Logistic regression; • 
Metropolis-Hastings algorithm • Variance reduction 



1 General idea 

The expected value of a function / with respect to a, possibly unnormalized, 
probability distribution tt, 

fJ,f = J /(x)7r(x)(ix/ / 7r(x)dx is to be evaluated. Markov chain Monte 
Carlo (MCMC) methods estimate integrals using a large but finite set of points, 
x%i = 1, • • • , A^, collected along the sample path of an ergodic Markov chain 
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having tt (normalized) as its unique stationary and limiting distribution fif = 

In this paper a general method is suggested to reduce the MCMC error by 
replacing / with a different function, /, obtained by properly re-normalizing 
/. The function / is constructed so that its expectation, under tt, equals /i/, 
but its variance with respect to tt is much smaller. To this aim, a standard 
variance reduction technique introduced for Monte Carlo (MC) simulation, 
known as control variates |39) . is exploited. 

In the rest of this section we briefly explain the zero- variance (ZV) principle 
introduced in 4, 5i: an almost automatic method to construct control variates 
for MC simulation, in which an operator, _ff , acting as a map from functions 
to functions, and a trial function, ■0, are introduced. 

In quantum mechanics, a commonly used operator H is the so-called Hamil- 
tonian, which represents the total energy of the system, that is, the sum of the 
kynetic energy and the potential energy, where the kinetic energy is typically 
defined as a second-order differential operator. Such operator is Hermitian 
(that is, self-adjoint) if it acts on the restricted class of infinitely differentiable 
functions with compact support. If the trial function ijj belongs to this class, 
and if 

Hy^ = (1) 
the re-normalized function defined as 

/(x) = /(x) + (2) 

satisfies = fi^: thus both / and / can be used to estimate the desired 
quantity via Monte Carlo or MCMC simulation. However, for general ■0 the 
condition fif = iij may not hold anymore and ad-hoc assumptions on the 
target tt are necessary: this issue will be further discussed in Section [5] 

Inspired by this physical setting, as a general framework H is supposed 
to be a Hermitian operator (self-adjoint and real in all practical applications) 
satisfying ([!]), and the re-normalized function is defined as in ([2]): depending 
on the specific choices of H and ip, the condition fif = ii^ has to be carefully 
verified. 

Only a few operators will be considered in the paper, the key one being 
the Hamiltonian differential operator. An other important example discussed 
below is the Markov operator H acting as Hip(x.) = J K{'K,y)ip{y)dy, where 
if (x, y) needs to be symmetric. The re-normalized function, in this case, be- 
comes 

/>) = /(x) + ^^^^^^^44^. (3) 

and the condition — holds as a simple consequence of ([T]). 

Regardless of the specific choice of the operator and of the trial function, 
the optimal pair {H^iji), i.e. the one that leads to zero variance, can be ob- 
tained by imposing that / is constant and equal to its average, / = /i/, which 
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is equivalent to require that cr'^{f) — 0, where (t^(-) denotes the variance op- 
erator with respect to the target tt. The latter, together with ([2|, leads to the 
fundamental equation: 

i/^ = -v/^[/(x) (4) 

In most practical applications equation Q cannot be solved exactly, still, we 
propose to find an approximate solution in the following way. First choose 
a Hermitian operator H verifying ([!]). Second, parametrize ip and derive the 
optimal parameters by minimizing a^{f). The optimal parameters are then 
estimated using a first short MCMC simulation. Finally, a much longer MCMC 
simulation is performed using fij instead of fif as the estimator. This final 
estimator will be called Zero Variance (ZV) estimator through the paper. 

Other research lines aim at reducing the asymptotic variance of MCMC 
estimators by modifying the transition kernel of the Markov chain. These mod- 
ifications have been achieved in many different ways, for example by trying to 
induce negative correlation along the chain path ( [6l [2TllT3ll4n[T2] ) : by trying 
to avoid random walk behavior via successive over-relaxation ( [T|[36lf7j ) ; by 
hybrid Monte Carlo f pHl351ll0l[TSlE7] ): by exploiting non reversible Markov 
chains ( [TOIIS^] '). by delaying rejection in Metropolis-Hastings type algorithms 
f[i51[^). by data augmentation ^W^) and auxiliary variables f pi[^[551 
[33]). Up to our knowledge, the only other research line that uses control vari- 
ates in MCMC estimation follows the PhD thesis by [23] and has its most 
recent developement in |14j . In |25j it is observed that, for any real- valued 
function g defined on the state space of a Markov chain {X"}, the one-step 
conditional expectation U{x) :— g(x) — E[(7(X"+^)|X" = x] has zero mean 
with respect to the stationary distribution of the chain and can thus be used 
as control variate. The Authors also note that the best choice for the function g 
is the solution of the associated Poisson equation which can rarely be obtained 
analytically but can be approximated in specific settings. In |14j . the use of 
this type of control variates is further explored in the setting of reversible 
Markov chains were a closed form expression for U is often available. 

In |31[5] unbiasedness and existence of a central limit theorem (CLT) for the 
ZV estimator are not discussed, neither in [28j . where this estimator is applied 
to a toy example. The main contributions of this paper are, on the one hand, to 
derive the rigorous conditions for unbiasedness and CLT for the ZV estimators 
in MCMC simulation. On the other hand, we apply the ZV principle to some 
widely used models (probit, logit, and GARCH) and demonstrate that, under 
very mild restrictions, the necessary conditions for unbiasedness and CLT are 
verified. 

2 Choice of H 

In this section guidelines to choose the operator H, both for discrete and 
continuous settings, are given. In a discrete state space, denote with P(x, y) a 
transition matrix reversible with respect to tt (a Markov chain will be identified 
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with the corresponding transition matrix or kernel) . We restrict our attention 
in this section to operators H acting as Hf := J2y y)/(y). The following 
choice 

^(x, y) = \/^[^(^' y) - ^(^ - y)] (5) 

satisfies condition ([I]), where 6{x — y) is the Dirac delta function: (5(x — y) = 1 
if X = y and zero otherwise. It should be noted that the reversibility condition 
imposed on the Markov chain is essential in order to have a symmetric operator 
if(x, y), as required. 

With this choice of H, letting — tp/y/TT, equation ^ becomes: 

/(x) = /(x) - J2 Pi^' y)[V^W - ^(y)]- 



The same H can also be applied in continuous settings. In this case, P is the 
kernel of the Markov chain and equation ([s]) can be trivially extended. This 
choice of H is exploited in [T3], where the following fundamental equation is 
found for the optimal ip: E[V'(xi)|xo = x] — ipix) = fif — /(x). It is easy to 
prove that this equation coincides with our fundamental equation Q , with the 
choice of H given in ([s]). The Authors observe that the optimal trial function 
is given by 

oo 

V;(x) = 5^[E[/(x„)|xo=x]-m;], (6) 

Tl = 

that is, tjj is the solution to the Poisson equation for /(x). However, an explicit 
solution cannot be obtained in general. 

Another operator is proposed in 1^: if x e M'' consider the Schrodinger- 
type Hamiltonian operator: 

i—l * 

where V(x) is constructed to fulfill equation (jlj): V = ^^A^/n and A denotes 
the Laplacian operator of second order derivatives. In this setting, we obtain 
the general expression for / reported in ([2]), where now H is the Schrodingcr- 
type Hamiltonian. These are the operator and the re-normalized function that 
will be considered throughout this paper. Although it can only be applied to 
continuous state spaces, this Schrodinger-type operator shows several advan- 
tages with respect to the operator ([5). First of all, in order to use Q the 
conditional expectation appearing in (6]) has to be available in closed form. 
Secondly, definition Q does not require reversibility of the Markov chain. 
Moreover, this definition is independent of the kernel P(x, y) and, therefore, 
also of the type of MCMC algorithm that is used in the simulation. Note that, 
for calculating / both with the operator Q and ([sj, the normalizing constant 
of TT is not needed. 
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3 Choice of xp 

The optimal choice of -0 is the exact solution of the fundamental equation Q . 
In real applications, typically, only approximate solutions, obtained by min- 
imizing cr^(/), are available. In other words, we select a functional form for 
V', parameterized by some coefhcients of a class of polynomials, and optimize 
those coefficients by minimizing the fluctuations of the resulting /. The par- 
ticular form of is very dependent on the problem at hand, that is on tt, and 
on /. In the sequel it will be assumed that ip = Pyp^^ where P is a polynomial. 
As one would expect, the higher is the degree of the polynomial, the higher 
is the number of control variates introduced and the higher is the variance 
reduction achieved. It can be easily shown that in a d dimensional space, us- 
ing polynomials of order p, provides ('^^^) — 1 control variates. However, some 
restrictions on the coefficients may occur in order to get an unbiased MCMC 
estimator. See Example [l] of Section [5] at this regard. 



4 Control Variates and optimal coefficients 

In this section, general expressions for the control variates in the ZV method 
are derived. Using the Schrodinger-type Hamiltonian iJ as given in ([t]) and 
trial function V'(x) = P(x)-\/ 7r(x), the re-normalized function is: 

/(x) = /(x)-iz\P(x) + VF(x).z, (8) 

where z = — iVln7r(x), V = ^g^, denotes the gradient and A = 

Eti&- Like any other control variate (i.e. zero mean random variables 
under the distribution of interest), the variable z can be monitored to test 
convergence along the lines suggested by and [38] , where the same control 
variate z — VlogTr is used. 

Hereafter the function P is assumed to be a polynomial. As a first case, 
for P(x) = ^j^j (l^t degree polynomial), one gets: 

/(x) = /(x) + ^ = /(x)+a^z. 
V'r(x) 

The optimal choice of a, that minimizes the variance of f{x), is: 

a=-Z'-,V(z,/), where IJ,,^E{zz'^), a(z, /) = E(z/). 

For a more general approach to the choice of coefficients using control variates, 
reference should be made to ^^'^ EO] ■ We anticipate that conditions under 
which the ZV-MCMC estimator obeys a CLT (Section [5| guarantee that the 
optimal a is well defined. In ZV-MCMC, the optimal a is estimated in a first 
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stage, through a short MCMC simulatior[^ When higher-degree polynomials 
are considered, a similar formula for the coefficients associated to the control 
variates is obtained once an explicit formula for the control variate vector z has 
been found. As an example, for quadratic polynomials P(x) = a^x+ ^x^Bx, 
the re-normalized / is : 

/(x) = /(x) - ^tr(i3) + (a + Bxfz. 

Using second order polynomials yields a vector of control variates of dimension 
^d{d+3). Therefore, finding the optimal coefficients requires working with Ezz 
which is a matrix of dimension of orderd^ . This makes the use of second order 
polynomials computationally expensive when dealing with high-dimensional 
sampling spaces, say of the order of decades. 



5 Unbiasedness and central limit theorem 

As remarked in Section [l] condition ([ij may not be sufficient to ensure unbi- 
asedness of the estimator when the Schrodinger operator ^ is used. In this 
section general conditionys on the target tt are provided that guarantee that 
the ZV-MCMC estimator is (asymptotically) unbiased for the class of trial 
functions discussed. Details can be found in the on-line supplementary mate- 
rial. Appendix D. 

Proposition 1 Let tt be a d- dimensional density on a bounded open set Q 
with regular boundary dQ, whose first and second derivatives are continuous. 
Then, if ip = P^/n, a sufficient condition for unbiasedness of the ZV-MCMC 
estimator is tt (x)^^ = 0, for all x e 912, j = 1, . . . , d. 

The previous proposition is a consequence of multidimensional integration by 
parts, from which one gets the equality 

/ [ipVy^ - y^VV-] • ndcr, (9) 

where n denotes the versor orthogonal to df^. 

When TT has unbounded support, integration by parts cannot be used di- 
rectly. In this case, we can formulate the following result. 

Proposition 2 Let t: be a d-dimensional density with unbounded support fl, 
whose first and second derivatives are continuous, and let (-Br)r be a sequence 
of hounded subsets, so that Br fi. Then, a sufficient condition for unbiased- 
ness of the ZV-MCMC estimator is 

lim / ttVP • ndfT = 0. 

^ From a practical point of view there is no need to run two separate chains, one to get the 
control variates and one to get the final ZV estimator: everything can be done on a single 
Markov chain which is run once to estimate the optimal coefficients of the control variates 
and then post-processed to get the ZV estimator. 
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In the univariate case, if f2 is some interval of the real line, that is, SI = {l,u) 
where m, Z € M U ±00, it is sufficient that 



dP{x) 



dx 



nil) 



dP{x) 



dx 



7r(w), 



(10) 



which is true, for example, if ^tt annihilates at the border of the support. 

In the seminal paper by [4] unbiasedness conditions are not clearly explored 
since, typically, the target distribution the physicists are interested in, anni- 
hilate at the border of the domain with an exponential rate. The following 
example shows how crucial the choice of trial functions is, in order to have an 
unbiased estimator, even in trivial models. 

Example 1 Let f{x) — x and tt be exponential: tt{x) ~ Ae^'^'^Ij^^Qj. If P(x) is 



a first order polynomial, ( 10 ) does not hold and this choice does not allow for 
a ZV-MCMC estimator, since the control variate ^ = — 5^ ln7r(x) is constant 
and (7(3;, z) = 0. However, to satisfy equation (10) it is sufficient to consider 
second order polynomials. Indeed, if P{x) ~ -\- aix + 022;^ equation (10) is 



satisfied provided that ai — Q and the minimization of the variance of / can 
be carried out within this special class. The optimal choice 02 := 2^ yields 



zero variance: a 



'(/)^0. 



5.1 Central limit theorem 

Conditions for existence of a CLT for fif are well known in the literature ( [44j ) . 
Using these classical results, from ([8| we have that the ZV-MCMC estimator 
obeys a CLT provided /, AP and VP • z belong to L^"'"^(7r) when the Markov 
chain run for the simulation is geometrically ergodic. In the next corollary, the 
case of linear and quadratic polynomials P (used in the examples in Section 
|6| is considered. 

Corollary 1 Let ip{x) — P(x)y^, where P(x) is a first or second degree 
polynomial. Then, the ZV-MCMC estimator jl^ is a consistent estimator of 
fif which satisfies the CLT, provided one of the following conditions holds: 

CI : The Markov chain is geometrically ergodic and f , x^Zj G L'^^^{ti), Vi,j, 

for all k G {0, deg P — 1} and some S > 0. 
C2 : The Markov chain is uniformly ergodic and f, x^Zj G L'^{tt), Vijj and 

for all k G {0,degP- 1}. 

In the case of linear P, using the definition of control variate, the statement 
of the previous corollary can be reformulated in this simple way: if / G i^(7r) 
and the chain is uniformly ergodic, then a sufficient condition to get a CLT is 

ruj = E. 



dx^ 



ln(vr(x)) 



< 00, 
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The quantity rrij is known in the literature as Linnik functional (if considered 
as a function of the target distribution, /(tt)) since it was introduced by }29j . 
The quantity rrij is also interpretable as the Fisher information of a location 
family in a frequentist setting. 



5.2 Exponential family 

Let TT belong to a d-dimensional exponential family: 7r(x) cx exp(/3 • T(x) — 
Kp{/3))p{x) , where /3 g M'* is the vector of natural parameters. The following 
theorem provides a sufficient condition for a CLT for ZV-MCMC estimators 
when the target belongs to the exponential family and a uniformly ergodic 
Markov Chain is considered. Similar results can be achieved when the Markov 
Chain is geometrically ergodic, by considering the 2 + 5 moment. This state- 
ment can be easily verified by a direct computation. 

Theorem 1 Let tt belong to an exponential family, with p such that € 
L'^{tt), yi,k. Then, the Linnik functional of t: is finite if and only if G 
L2(7r), Vi,fc. 

Example 2 The Gamma density r(a, 9) can be written as an exponential fam- 
ily on (0, -foo), where p{x) = 1, so that hypotheses of Theorem [T] are satisfied. 
A direct computation shows that the Gamma density r(a, 9) has finite Linnik 
functional for any 9 and for any a G {1} U (2, -l-oo). Under these conditions, a 
CLT holds for the ZV-MCMC estimator. 



6 Examples 

In the sequel standard statistical models are considered. For these models, 
the ZV-MCMC estimators are derived in a Bayesian context; from now on, 
the target tt = 7r(/3|x) is the Bayesian posterior distribution: therefore, the 
argument associated with the state of the Markov chain is denoted by (3 instead 
of X, which represents, now, the vector of data. The operator H considered is 
the Schrodinger-type Hamiltonian defined in ([7|, and 4' = P\/^i where P is a 
polynomial. 

Numerical simulations are provided, that confirm the effectiveness of vari- 
ance reduction achieved, by minimizing the variance of / within the class 
of trial functions considered. Moreover, conditions for both unbiasedness and 
CLT for / are verified for all the examples. For the mathematical derivation 
of the zero-variance estimator and the proofs of unbiasedness and CLT for 
the models considered, we refer the reader to the appendices of the on-line 
supplementary material (Appendices A, B and C). 
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6.1 Probit Model 



To demonstrate the effectiveness of ZV for probit models, a simple example 
is presented. The bank dataset from [T7| contains the measurements of four 
variables on 200 Swiss banknotes (100 genuine and 100 counterfeit). The four 
measured variables Xi {i ~ 1,2,3,4), are the length of the bill, the width of 
the left and the right edge, and the bottom margin width. These variables 
are used in a probit model as the regressors, and the type of the banknote 
Hi, is the response variable (0 for genuine and 1 for counterfeit). Using flat 
priors, the Bayesian estimator of each parameter, /3j,, under squared error loss 
function, is the expected value of /fc(/3) = /?fe under tt {k — 1, 2, • • • ,d). The 
Bayesian analysis of this problem is discussed in [3T]. In order to find the op- 
timal vector of parameters of the trial functions, a short Gibbs sampler, 
foUowing ([2]), (of length 2000, after 1000 burn in steps) is run, and the op- 
timal coefficients are estimated: aj; — ~IJ^J^a{z, /3k)- Finally another MCMC 
simulation of length 2000 is run (and using the estimated optimal values ob- 
tained in the previous step), along which /fc(/3), for fc = 1, . . . , 4 is averaged. 
We have repeated this experiment 100 times. The MCMC traces of the ordi- 
nary MCMC and the ZV-MCMC in one of these MOnte Carlo experiments 
have been depicted in the left plot of Fig. [l] The blue curves are the traces of 
fk (ordinary MCMC), and the red ones are the traces of fk (ZV-MCMC). It 
is clear from the figure that the variances of the estimator have substantially 
decreased. Indeed for the linear trial functions, the ratios of the Monte Carlo 
estimates of the asymptotic variances of the two estimators (ordinary MCMC 
and ZV-MCMC) are between 25 and 100. Even better performance can be 
achieved using second degree polynomials to define the trial function. In the 
right column of Fig. [T] the traces of ZV-MCMC with second order P{x) are 
reported along with the traces of the ordinary MCMC. As it can be seen from 
the figure, the variances of the ZV estimators are negligible: the ratio of the 
Monte Carlo estimates of the asymptotic variances of the two estimators are 
between 18, 000 and 90, 000. In this example (with the simulation length and 
burn-in reported above) the CPU time of ZV-MCMC is almost 3 times larger 
than the one of ordinary MCMC. 



In order to study the unbiasedness of the ZV-estimators empirically, we 
have run a very long MCMC (of length 10^) and obtained a very narrow 95% 
confidence region for each parameter. In Fig. [2] we have depicted the box-plot 
of the ordinary MCMC (first box-plot), and the ZV-estimators (second and 
third box-plot) along with these 95% confidence regions (the green regions). 
As it can be seen, the ZV-estimators are concentrated in the 95% confidence 
regions obtained from the very long chain. 
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Fig. 2 Boxplots of ordinary MCMC estimates (1) and ZV-MCMC estimates (2 and 3) for 
the probit model, along with the 95% confidence region obtained by an ordinary MCMC of 
length 10* (green regions). 
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Fig. 3 Box-plots of ordinary MCMC estimates (1) and ZV-MCMC estimates (2 and 3) for 
the logit model, along with the 95% confidence region obtained by an ordinary MCMC of 
length 10* (green regions). 



6.2 Logit Model: 

A logit model is fitted to the same dataset of Swiss banknotes previously in- 
troduced. Flat priors are used and, as before, the Bayesian estimator of each 
parameter, j3k, (again under squared error loss functions) is the expected value 
of /3fc under tt (fc = 1, 2, • • • ,d). Similar to the probit example, in the first stage 
a MCMC simulation is run, and the optimal parameters of P(/3) are estimated. 
Then, in the second stage, an independent simulation is performed, and fk is 
averaged, using the optimal trial function estimated in the first stage (the same 
simulation length and burn-in, as in the probit example, have been used). For 
linear polynomial, the ratio of the Monte Carlo estimates of the asymptotic 
variances of the two estimators (ordinary MCMC and ZV-MCMC) are between 
15 and 50. Using quadratic polynomials, these ratios are between 15, 000 and 
20,000. In this example the CPU time of the ZV-MCMC is almost 3 times 
higher than that of ordinary MCMC. 

We have run a very long MCMC (of length 10^) and obtained a very 
narrow 95% confidence region for each parameter. In Fig. [s] we have depicted 
the box-plot of the ordinary MCMC (first box-plot), and the ZV-estimators 
(second and third box-plot) along with these 95% confidence regions (the green 
regions). Again, as it can be seen, the ZV-estimators are concentrated in the 
95% confidence regions obtained from the very long Markov chain. 
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Table 1 GARCH variance reduction; 95% Confidence interval for the ratio of the variances 
of ordinary MCMC estimators and ZV-MCMC estimator. 

















LU-S 


1st Degree P{x) 


8-18 


13-28 


12-27 


2nd Degree P{x) 


1200-2700 


6100-13500 


6200-13800 


3rd Degree P{x) 


21000-47000 


48000-107000 


26000-58000 



6.3 GARCH Model 

Generalized autoregressive conditional heteroskedasticity (GARCH) models 
([5]) have become one of the most important building blocks of models in fi- 
nancial econometrics, where they are widely used to model returns. Here it 
is shown how the ZV-MCMC principle can be exploited to estimate the pa- 
rameters of a univariate GARCH model applied to daily returns of exchange 
rates in a Bayesian setting. Let S{t) be the exchange rate at time t. The daily 
returns are defined as r{t) := [S{t) - S{t-1)]/ S{t- 1) In iS{t)/S{t - 1)). In 
a Normal-GARCH model, we assume the returns are conditionally Normally 
distributed, r{t)\Tt ^ A/'(0, ht), where ht — u\ +u!3ht-i +ui2ri_i, and uji > 0, 
UJ2 > 0, and W3 > are the parameters of the model. The aim is to estimate 
the expected value of ujj under the posterior tt, using independent truncated 
normal priors. As an example, a Normal-GARCH(l, 1) is fitted to the daily 
returns of the Deutsche Mark vs British Pound exchange rates from January 
1985, to December 1987. In the first stage a short MCMC simulation ([5]) is 
used to estimate the optimal parameters of the trial function (2000 sweeps af- 
ter 1000 burn- in). In the second stage an independent simulation is run (with 
length 10000) and /fc(a;) is averaged in order to efficiently estimate the poste- 
rior mean of each parameter. We compare this ZV-MCMC with an ordinary 
MCMC of length 10000 (after 1000 burn-in). First, second and third degree 
polynomials in the trial function are used. In order to study the effectiveness of 
ZV-MCMC, we have run these simulations (ordinary MCMC and ZV-MCMC) 
100 times. As it can be seen in Table [T] where a 95% confidence interval for 
the variance reductions are reported, the ZV strategy reduces the variance of 
the estimators up to ten thousand times. In this example (with the simulation 
and burn-in lengths reported above) the CPU time of the ZV-MCMC is almost 
20% higher than the CPU time of ordinary MCMC. 

In order to study the unbiasedness of the ZV-estimators empirically, we 
have run a very long MCMC (of length lO'') and obtained a narrow 95% 
confidence region for each parameter. In Fig. [4] we have depicted the box- 
plot of the ordinary MCMC (first box-plot), and the ZV-estimators (second, 
third and fourth box-plots) along with these 95% confidence regions (the green 
regions). As it can be seen the ZV-estimators lie in the range obtained by the 
very long MCMC. 
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Fig. 4 Boxplots of ordinary MCMC estimates (1) and ZV-MCMC estimates (2, 3 and 4) for 
the GARCH model, along with the 95% confidence region obtained by an ordinary MCMC 
of length 10^ (green regions). 



Finally, note that the ZV strategy can be used in great generality and 
can be applied also to more complex GARCH models (such as E-GARCH, 
I-GARCH, Q-GARCH, GJR-GARCH, 0), provided it is possible to analiti- 
cally compute the necessary derivatives and verify the hypotheses needed for 
unbiasedness and CLT, in a way similar to the proof reported in Appendix C. 



7 Discussion 



Cross-fertilizations between physics and statistical literature have proved to 
be quite effective in the past, especially in the MCMC framework. The first 
paradigmatic example is the paper by [23] first and [19] later on. 

Besides translating into statistical terms the paper by [4], the main effort 
of our work has been the discussion of unbiasedness and convergence of the 
ZV-MCMC estimator. The study of CLT leads to the condition of finiteness for 
jg.^j^aio|^^-j2j^ This quantity has also been used in the recent paper by [50] 
as a metric tensor to improve efficiency in Langevin diffusion and Haniiltonian 
MC methods. Their idea is to choose this metric as an optimal, local tuning of 
the dynamic, which is able to take into account the intrinsic anisotropy in the 
model considered. In our understanding, what makes these methods and our 
extremely efficient, is the common strategy of exploiting information contained 
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in the derivatives of the log-target. A combination of the two strategies could 
be explored: once the derivatives of the log-target are computed, they can 
be used both to boost the performance of the Markov chain (as suggested 
by [20]) and to achieve variance reduction by using them to design control 
variates. This is particularly easy since control variates can be constructed by 
simply post-processing the Markov chain and, thus, there is no need to re-run 
the simulation. 

The second main contribution of this paper is the critical discussion of the 
selection of H and ip. A comparison between the variance reduction framework 
exploited in [13] and the choice of different operators H in our context has 
remarked contras and benefits of the two approaches. Different choices of H 
and Tp could provide alternative efficient variance reduction strategies. This 
can be easily achieved by considering a wider class of trial functions: ip{x) = 
P(x)(7(x), where, as before, -P(x) denotes a parametric class of polynomials, 
and 9(x) is an arbitrary (sufficiently regular) function. 

In the present research we have explored ip based on first, second and 
third degree polynomials. Despite the use of this fairly restrictive class of trial 
functions, the degree of variance reduction obtained in the examples in Section 
[g] and in other simulation studies (not reported here) is impressive and of the 
order of ten times (for first degree polynomials) and of thousand times (for 
higher degree polynomials), with practically small extra CPU time needed in 
the simulation. 

Finally, mention should be made to an alternative, more general renormal- 
ized function / reported in the paper by [5], defined as: 

H*_«iVi)_ (11) 

where, again, H is an Hamiltonian operator and ip a quite arbitrary trial 
function. In this setting, HH — —^A + V, under the same, mild conditions 
discussed in Section [H] / has the same expectation as / under it. This is true 
without imposing condition ([ij, so that now V can be also chosen arbitrarily. 
Therefore, the re-normalization ( [Tl] ) allows for a more general class of Hamil- 
tonians. 



Supplementary Materials 

Supplementary materials are available. In Appendices A, B and C the zero 
variance estimator and the proof of CLT are given for all the examples. In 
Appendix D computations of unbiasedness conditions discussed in Section [5] 
are reported and verified for the three examples. 
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Appendix A: Probit model 

Mathematical formulation 

Let yi be Bernoulli r.v.'s: yi|xj ~ B{l,pi), Pi = <P{xfl3), where ^ € M"^ is 
the vector of parameters of the model and is the c.d.f. of a standard normal 
distribution. The likelihood function is: 



i=l 



As it can be seen by inspection, the likelihood function is invariant under the 
transformation (xj,yj) — (— Xj, 1 — yi). Therefore, for the sake of simplicity, 
in the rest of the example we assume J/, = 1 for any i, so that the likelihood 
simplifies: 



This formula shows that the contribution of x, = is just a constant ${xjl3) = 
^(0) = 1 7 therefore, without loss of generality, we assume for all i, Xj 7^ 0. 

Using flat priors, the posterior of the model is proportional to the likeli- 
hood, and the Bayesian estimator of each parameter, /3k, is the expected value 
of /fc(/3) = f3k under tt (fc = 1, 2, • • • ,d). 

Using Schrodinger-type Hamiltonians. H and ijjk{P) — Pfe(/3)-\/7r(/3), as the 
trial functions, where Pk{P) — 'I2j=i'^j,kl3j is a first degree polynomial, one 
gets: 



/;c(/3) = /fe(/3) + = MP) + E"^-.'^^^' 



d 



where, for j = 1,2, . . . ,d, 



1^ 

2i = -o2. 



2 lit 

because of the assumption yi = 1 for any i. 
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Central limit theorem 



In the following, it is supposed that P is a linear polynomial. In the Probit 
model, the ZV-MCMC estimators obey a CLT if Zj have finite 2 + 5 moment 
under tt, for some 5 > 0: 



E 



2+6' 



C1C2 



2+5 



n'Z'(xf/3)d/3<oo. 



where ci — 2 ^ and C2 is the normalizing constant of tt (the target poste- 
rior). Define: 



Km 



2+S 



and therefore: 



n 

1=1 

=i^l(/3)i^2(/3) 



where c = C1C2. Before studying the convergence of this integral, the following 
property of the likelihood for the probit model is needed. 

Proposition 3 Existence and uniqueness of MLE implies that, for any /3q G 
M'^ \ {0}j there exists i such that xf (3o < 0. 

Proof (by contradiction). Uniqueness of MLE implies that x^x is full rank, 
that is, there is no f3o orthogonal to all observations Xj. This can be seen by 
contradiction: singularity of X"^x implies existence of a non-zero /3o orthogonal 
to all observations x^. This fact, in turn, implies /(/3|x,y) = 1{P + c/3o|x,y) 
and, therefore, Z(»|x, y) does not have a unique global maximum. 
Next, assume there exists some /3o G K"* such that, for any i, xf/3o > 0. 
Then (3o is a direction of recession for the negative log-likelihood function 
— X]r=i lii^^(xf /3) (that is a proper closed convex function). This implies that 
this function does not have non-empty bounded minimum set ( [40] ) . which 
means that the MLE does not exist. ■ 

Now, rewriting K{P)dp in hyper-spherical coordinates through the bijec- 
tive transformation (p, 6*1, ... , Od-i) := F{f3), where F^^ is defined as 
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=pcos(0i) 

< A = pcos{ei) U'~li M^m), for I = 2, d - 1 (12) 
for e e := {0 < 61, < TT, i = 1, . . . , d - 2, < Od-i < 27r} and p > 0, one gets 



< 



Je Jo 

p p+OO 

/ / K{F-\p,e))p''-' dpde 

J0 Jo 

L 



:= / A{e)d9, 

10 

Observe that the integrand is well defined for any [p, 9) on the domain of 
integration, so it is enough to study its asymptotic behaviour when p goes to 
infinity, and 0^0. 

First, analyze 

2+5 

Xij<l){\y.i\p\i{e) 



K,{F-\p,e)) 



1^ Hi^iipxm) 



where, for any i, Aj is a suitable function of the angles 6 such that A, e 
[—1,1], which takes into account the sign of the scalar product in the original 
coordinates system. 

For any i, when p — >■ oo 

-ifA,>0, ^^|g^eO(</.(A,p)); 



_ ;f \ _ n a:ij0(|xi|pAi) 
II At — U, <g(|x,|pA.) 

Therefore: 



eO{i). 



and, for any 61 e 6>: Ki{F-^{p,e)) & O {p^+'') .now,ioc\xsonK2{F-^{p,e)) = 
nr=i ^{\^i\P^i{^))\ existence of MLE for the probit model implies that, for any 
6 G 0, there exists some / {I < I < n), such that Xi{9) < 0, and therefore: 



K2{F-\p,0)) < ^{\^i\pXi) e O {<t>{Xip)) p ^ 00. 
Putting these results together leads to 

K{F-\p,9)) = Kr{F-\p,e))K2{F-\p,e)) & O {p^+' cl^{Xi{e) p)) 



(13) 
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SO that, for any G 0, 

Therefore, whenever the value 9 0, its integrand converges to zero rapidly 
enough when p — >■ +oo. This concludes the proof. 

Note 1. In ([H]) it is shown that the existence of the posterior under flat 
priors for probit and logit models is equivalent to the existence and finite- 
ness of MLE. This ensures us that the posterior is well defined in our con- 
text. In order to verify the existence of the posterior mean, we can use a 
simplified version of the proof given above. In other words we should show 
/ nr=i '^i^If^W < +00, that is, Ki{l3) = jSj G 0{p). Therefore, a weaker 
version of the proof given above can be employed. 

Note 2. In the proof given above we have used flat priors: although this 
assumption simplifies the proof, however a very similar proof can be applied for 
non-fiat priors. Assume the prior is 7ro(/3). Under this assumption the posterior 
is 7r(/3) — TTo{(3) Z(/3|y,x) oc tto{/3) n"=i^(^f/^) ^^'^ control variates are: 
Zj = ^i^^i^^ — 5 Sr=i ^'^{x^p)^ ■ Therefore we need to prove the 2 + 6-th 
moment of Zj under 7r(/3) is finite. A sufficient condition for this is the finiteness 
of 2 + (5-th moments of — ^ '""^^"^'^^ and ^'^^t'^^^ under 7r(/3). If we 

assume 7ro(/3) is bounded above, the latter is a trivial consequence of the proof 
given above for the fiat priors. Therefore we only need to prove the finiteness 
of the integral 

2+5 n 

MP) i[<p{^fmp. 

i=l 

Again if we assume the prior is bounded from above, a sufficient condition for 
the existence of this integral is the existence of the following integral: 

2+6 n 
i=l 

A proof very similar to the one given above will show that this integral is finite 
for common choices of priors 7ro(/3) (such as Normal, Student's T, etc). 



/ 



dln7ro(/3) 



d/3. 





din 7ro(/3) 


L 





Appendix B: Logit model 

Mathematical formulation 

In the same setting as the probit model, let pi = j^^^^p^^^^) ''^'^^'"'^ /? G M"^ is 
the vector of parameters of the model. The likelihood function is: 

'(^i-)=<n(rf^)"(TT^)"- <"> 
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By inspection, it is easy to verify that the likehhood function is invariant 
under the transforination:(xi, j/^) — > (— x^,! — yi). Therefore, for the sake of 
simphcity, in the sequel we assume yi — for any i, so that the hkelihood 
simphfies as: 



" 1 
1 + exp 

2—1 



exp(x//3) 

The contribution of x, = to the Ukehhood is just a constant, therefore, 
without loss of generality, it is assumed that x^ ^ for all i. Using flat priors. 



the posterior distribution is proportional to (14) and the Bayesian estimator 
of each parameter, is the expected value of fkiP) — Pk under tt {k ~ 
1,2, •• • ,d). Using the same pair of operator H and test function ipk as in 
Appendinx A, the control variates are: 



1 ^ exp(x/ P) • 1 o J 

2 ^ 1 + exp(x/ /3 



Central limit theorem 



As for the probit model, the ZV-MCMC estimators obey a CLT if the control 
variates Zj have finite 2 + S moment under tt, for some S > : 



E 



exp(xf^) 
''■'■l+exp(xf/3) 



2+S' 



C1C2 



exp(xf/3) 
1 + exp(xf /3) 



2+S 



1 

exp(xf/3) 



d/3 



where ci — 2^^^^ , and C2 is the normalizing constant of tt. The finitiness of the 
integral is, indeed, trivial. Observe that the function e^/(l + e^) is bounded 
by 1; therefore, 



2+5 



E. [.n< 



< 00. 



\i=l 



As for probit model, note that it can easily be shown that the posterior 
means exist under flat priors. Moreover, a very similar proof (see Note 2 of 
Appendix A) can be used under a normal or Student's T prior distribution. 



Appendix C: GARCH model 

Mathematical formulation 

We assume that the returns are conditionally normal distributed, r{t)\J-t ^ 
J\f{0,ht), where ht is a predictable (J-'t-i measurable process): ht = uji + 
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^^3^4-1 + '^2?'t-i, where uii > 0, UJ2 > 0, and > 0. Let r = (n, . . . ,rT) be 
the observed time series. The hkelihood function is equal to: 



t 



and using independent truncated Normal priors for the parameters, the pos- 
terior is: 



TT (wi, W2, W3|r) oc exp 



+ 



+ 



2 WiuJi) a^{uj2) a^us) 



\t=i 



exp 



Therefore, the control variates (when the trial function is a first degree poly- 
nomial) are: 



dlnn 
doJi 

where: 

i-uji 



dht 



doji 1 — W3 ' dijj2 



1 1 dht 1 



dht 



fT2(wi) 2^^ht dtJi 2 ^ h; doj. 



dht-1 



It>i, 



dht 



i = 1,2,3, 



= ht-i +u;3 



dht-i 



Central limit theorem 

In order to prove the CLT for the ZV-MCMC estimator in the Garch model, 
we need: 

^eL^+^W. (15) 

OUJ2 OUJ3 OOJi 

To this end, ht and its partial derivatives should be expressed as a function of 
ho and r: 

t-i t-i 
/it = wi(^ 1 + w^) + (^Iho + W2(^ 1 + w^r-2_i_ J, 

k=l fe=l 



9 In /i. 



t-2 



du)2 

din ht 
du)3 



* I 2 

- = ri 



1 + 11^3 ' '^1 1 ht>i}, 

t-2 

ht-i+^L^l'^^^^hj j I{t>i}. 

j=0 
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Next, moving to spherical coordinates, the integral ( 15 ) can be written as 



/ / Kj{p;e,(f>)dpd9d(f> -.^ [ Ajie,(t>)d0d<j), 

J[0,7r/2]2 Jo J[0,7r/2]2 

where, for j = 1, 2, 3, Kj{-, 9, (j)) = \Wj\'^+^ x W, with 




T 



^ = exp - sP' cos2 e sin^ + sin^ 6 sin^ cj) + cos^ - ^ J] 



X sin 



(16) 



and 



t-i 



/it = -pcos6'sin0^(l+p''' cos*" 0)+ft.op* cos* sin sin (j) (l+?'t^_i_fc|o'' cos* 
fe=i fe=i 

The aim is to prove that, for any 0,(l> G [0, 7r/2] and for any j, Aj{6, </)) is finite. 
To this end, the convergence of Aj for any 6, (j) should be discussed. 

Let us study the proper domain of Kj{-;9,(j)). Observe that Kj{-;6,(f>) is 
not defined whenever ht = and, if j 1 and (j) ^ 7r/2, also for p — 1/ cos 
However, the discontinuity of K^^^ at this point is removable, so that the domain 
of K-i can be extended by continuity also at p — 1/ cos0. 

Since /it = if and only if p = 0, it can be concluded that, for any j and for 
any 0, </) € [0, 7r/2], the proper domain oiKj{-]9, cj)) is Aom.Kj{-; 6, (p) — (O+oo). 
By fixing the value of 9 and 0, let us study the limits of Kj when p ^ and 
p — > +0O. Observe that, whatever the values of 9 and (j) are, W^ 's are rationale 
functions of p. Therefore, for any j, \Wj\'^~^^ cannot grow towards infinity more 
than polynomially at the boundary of the domain. On the other hand, W goes 
to zero with an exponential rate both when p — > and p — >■ +oo, for any 9 and 
(j>. This is sufficient to conclude that, for any 9,(j) ^ [0, tt/2], the integral Aj is 



finite for j = 1,2,3 and, therefore, condition ( 15 ) holds and the ZV estimators 
for the GARCH model obeys a CLT. 



22 



Antonictta Mira et al. 



Appendix D: unbiasedness 

In this Appendix, explicit computations are presented, which were omitted in 
Section 5. Moreover, it is proved that aU the ZV-MCMC estimators discussed 
in Section 8 are unbiased. 

FoUowing the same notations as in Section 5, equation (9) follows because 



Hi) 



J Q 2 Jqq 2 J Q 



in ^ Jon 

If If If 

Vy'TTip / y'nWip ■ nd(T ^ — / ipVy/Ti-nda / il^Ay/T: 

n 2 Jqq 2 Jqq 2 Jq 



J n 2 Jqq 

I [■(/'Va/tt — ^/nVtp] ■ nda. 

2 Jdf2 



Therefore, ( ^ ) = if VV^^ = y^VV- on dQ. Now, let ^ = Py/^- 



Then, 



/TT 



SO that ( ^ ) if 



p 

S/tP = V^VP + ——Vtt, 



7r(x)^^^^=0, VxGdn, j^l,...,d. 
dxj > J . : 

When TT has unbounded support, following the previous computations inte- 
grating over the bounded set Br and taking the limit for r — > cx), one gets 

(^)^\ Hm / TrVP-nda. (17) 

Therefore, unbiasedness in the unbounded case is reached if the limit appearing 
in the right-hand side of (171 is zero. 

Now, the unbiasedness of the ZV-MCMC estimators exploited in Section 
8 is discussed. To this end, condition (17) should be verified. Let Bp be a 
hyper-sphere of radius p and let n := be its normal versor. Then, for linear 
P, (17 1 equals zero if, for any j = 1, . . . ,d, 

lim ^ / TT {13)13. dS ^Q. (18) 

p^+oo p 



Zero Variance Markov Chain Monte Carlo for Baycsian Estimators 



23 



The Probit model is first considered. By using the same notations as in 

(19) 



Appendix A, the integral in ( 18 1 is proportional to 

1 



lim 

p->+oo p Jq 



K2{F-\p,e))p''d9. 



Note that, because of (13), there exist po a-^d M such that 

< M<i>{\g)Po)pt G{e) Vp > po- 

Since G{9) £ L^, by the dominated convergence theorem a sufficient condition 
to get unbiasedness is 



hin K2{F-\p,6))p''-^ =Q, 

p— >-+oo 



(20) 



which is true, because of (131 for the Probit model. 

We now consider the Logit model for which it is easy to prove that Propo- 
sition [3] holds. As done for the Probit model, one can write 



where 



exp(xf/ 



2+S 



n 

By using the hyper-spherical change of variables in (12), we get 

[z^+']^ j^J^ K,{F-\p,e))K2{F-\p,e))P^-' dp: 



and, as for the Probit model, we must verify Equation (19). Now analyze 
K2{F~^{p,d)); for any 6, existence of MLE implies the existence of some I 
{1 < I < n), such that Xi{9) > 0, and therefore: 



K,{F-\p,e)) = l[- 



1 



< 



cxp(|xi|pAi) 
1 



1 + exp(|x,|pA;) 
e O (exp(-pAO) . 



(21) 
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Therefore, there exist po, M such that 

< Mexp(-Ai(e)Po)Po - G{e) Vp > pq, 



where G{6) G L^. These computations ahow us to use Equation (20) as a 
sufficient condition to get unbiasedncss, and its proof becomes triviaL 

Finally consider the GARCH model. In this case, Bp is the portion of a 
sphere of radius p defined on the positive orthant. Then, the limit 



lim 



1 



WiF-\p,9))p''de, 



where W was defined in (16 1, should be discussed. Again, an application of 
the dominated convergence theorem leads to the simpler condition 

p-l/ „ o\\ „2 



lim W{F-'{p,e))p'' 

p^-\-oo 



0, 



which is true, since W decays with an exponential rate. 
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